Skip to content

[MRG+1] Integrate loky as default backend#516

Merged
lesteve merged 32 commits intojoblib:masterfrom
tomMoral:loky_integration
Aug 12, 2017
Merged

[MRG+1] Integrate loky as default backend#516
lesteve merged 32 commits intojoblib:masterfrom
tomMoral:loky_integration

Conversation

@tomMoral
Copy link
Copy Markdown
Contributor

@tomMoral tomMoral commented Apr 13, 2017

This PR aims to integrate loky as a backend in joblib. This PR features:

  • Vendoring of loky as joblib.externals.loky
  • Vendoring of cloudpickle as joblib.externals.cloudpickle
  • Fixes the integration of loky.get_reusable executor as a parallel backend.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2017

Codecov Report

Merging #516 into master will increase coverage by 0.43%.
The diff coverage is 96.22%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #516      +/-   ##
==========================================
+ Coverage   93.73%   94.16%   +0.43%     
==========================================
  Files          37       39       +2     
  Lines        4785     4953     +168     
==========================================
+ Hits         4485     4664     +179     
+ Misses        300      289      -11
Impacted Files Coverage Δ
joblib/numpy_pickle.py 98.47% <ø> (ø) ⬆️
joblib/test/test_numpy_pickle.py 99.04% <100%> (-0.01%) ⬇️
joblib/test/test_parallel.py 95.52% <100%> (+0.3%) ⬆️
joblib/pool.py 91.37% <100%> (-0.26%) ⬇️
joblib/test/test_memory.py 99.36% <100%> (+0.84%) ⬆️
joblib/testing.py 87.5% <100%> (+0.65%) ⬆️
joblib/executor.py 100% <100%> (ø)
joblib/test/test_memmapping.py 100% <100%> (ø)
joblib/parallel.py 98.2% <100%> (+1.16%) ⬆️
joblib/disk.py 80.35% <65%> (-0.04%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b6e44b...7502017. Read the comment docs.

@ogrisel ogrisel changed the title Integrate loky as default backend [WIP] Integrate loky as default backend Apr 14, 2017
@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Apr 14, 2017

This is WIP because we need to deal with the dependencies (loky and cloudpickle). @tomMoral will work on a second PR that sits on top of this one to vendor those two dependencies into a joblib/externals or joblib/vendor folder.

@tomMoral tomMoral force-pushed the loky_integration branch 7 times, most recently from 804255e to 50953d0 Compare May 13, 2017 21:26
@tomMoral
Copy link
Copy Markdown
Contributor Author

I finished vendoring loky and cloudpickle as part of joblib but I did it in the same PR.
I can open 2 separate PR is you want.

Copy link
Copy Markdown
Contributor

@aabadie aabadie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if this PR is marked WIP, I wanted to have a look at it.
For the moment, I only made a code review and have a few comments. Since the CI passes, it should work well.
Do you have some benchmarks compared to multiprocessing/threading ?

Makefile Outdated

test:
pytest joblib

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empty line is not necessary

@@ -0,0 +1,357 @@
# -*- coding: utf-8 -*-
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A module docstring is missing. This is not the case in other internal modules.

# done processing this data.
if not os.path.exists(filename):
if self.verbose > 0:
print("Memmapping (shape=%r, dtype=%s) to new file %s" % (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer using .format() with string. This needs to be changed in other places as well.

return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))


def delete_folder(folder_path):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the kind if function that already exists in the disk.py module. Maybe it could be factorize here ?


if np is not None:
# Register smart numpy.ndarray reducers that detects memmap backed
# arrays and that is alse able to dump to memmap large in-memory
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also

# This is a non-regression test for the "Pool seems closed" class of error
params = {'n_jobs': n_jobs, 'pre_dispatch': pre_dispatch,
'batch_size': batch_size}
'batch_size': batch_size, 'verbose': 100}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this change really required ?

try:
import cloudpickle # noqa
except ImportError:
# This test pass only when fork is the process start method or
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/pass/passes/


# Make sure that the shared memory is cleaned at the end when we exit
# the context
assert len(os.listdir(tmpdir)) == 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert not os.listdir(tmpdir)

p = Parallel(n_jobs=2, max_nbytes=1, backend=backend)
p(delayed(check_memmap)(a) for a in [np.random.random(10)] * 2)

assert len(os.listdir(tmpdir)) == 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert not os.listdir(tmpdir)

setup.cfg Outdated
--doctest-glob="doc/*.rst"
--doctest-modules
testpaths = joblib
testpaths = joblib/test
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure we want to change this.

joblib/disk.py Outdated
err_count += 1
if err_count > RM_SUBDIRS_N_RETRY:
warnings.warn(
"Enable to delete folder {} after {} tentatives."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Enable/Unable/

@aabadie
Copy link
Copy Markdown
Contributor

aabadie commented May 26, 2017

There's still an issue when removing a directory on Windows. It seems that there's still a file open in the directory and Windows prevents that kind of things...

@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Jun 1, 2017

Do you have some benchmarks compared to multiprocessing/threading ?

I ran various tests on a big rackspace machine (sklearn grid searches) and observed no performance regression over multiprocessing.

@tomMoral tomMoral force-pushed the loky_integration branch 3 times, most recently from 3353076 to 653ffc2 Compare June 2, 2017 11:56
Copy link
Copy Markdown
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the [WIP] should be updated to [MRG].

Makefile Outdated

test-no-multiprocessing:
export JOBLIB_MULTIPROCESSING=0 && pytest joblib
export JOBLIB_MULTIPROCESSING=0 && pytest joblib -c setup.cfg
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "-c setup.cfg" is no longer necessary.

@tomMoral tomMoral force-pushed the loky_integration branch from 653ffc2 to 99158ef Compare June 2, 2017 12:05
@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Jun 2, 2017

Please re-run the vendor_cloudpickle.sh script as part of this PR to get the 0.3.1 release that was released yesterday or the day before.

@tomMoral tomMoral force-pushed the loky_integration branch from 8b9d967 to ad7d4e1 Compare June 2, 2017 12:40
@ogrisel ogrisel changed the title [WIP] Integrate loky as default backend [MRG] Integrate loky as default backend Jun 2, 2017
@ogrisel ogrisel changed the title [MRG] Integrate loky as default backend [MRG+1] Integrate loky as default backend Jun 2, 2017
@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Jun 2, 2017

@tomMoral Please add a new test to check that the (default) locky backend accepts nested parallelism (without warnings).

@tomMoral
Copy link
Copy Markdown
Contributor Author

tomMoral commented Jun 2, 2017

@ogrisel : test nested loky with no warning added.
@aabadie : The error with the windows memmaps release have been fixed.
Let me know if you see other parts that need fixes.

@aabadie
Copy link
Copy Markdown
Contributor

aabadie commented Jun 2, 2017

Fine with me. Merge when you want :)

to leave this file unmodified
when running tests from the joblib root foler and joblib is not installed
@ogrisel
Copy link
Copy Markdown
Contributor

ogrisel commented Aug 11, 2017

@lesteve I addressed the remaining comments and rebased to benefit from the merge of #541.

@lesteve
Copy link
Copy Markdown
Member

lesteve commented Aug 12, 2017

OK LGTM, merging, great stuff!

@lesteve lesteve merged commit e7491c8 into joblib:master Aug 12, 2017
@GaelVaroquaux
Copy link
Copy Markdown
Member

GaelVaroquaux commented Sep 4, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants