[MRG+1] NMF speed-up for beta_loss = 0 #9277
Merged
jnothman merged 6 commits into scikit-learn:master on Jul 5, 2017
Conversation
Member
|
How does **= -2 compare? It is strange to me that you would need to do two
**= operations to get the speedup.
…On 5 Jul 2017 9:07 am, "hongkahjun" wrote:
Suggestion for speeding up IS divergence in NMF mu update:
WH_safe_X_data **= -1
WH_safe_X_data **= 2
is much faster than
WH_safe_X_data **= beta_loss - 2
Using line_profiler in IPython to time the lines (time per line, in line_profiler timer units, 1e-6 s by default):

4363077  WH_safe_X_data **= beta_loss - 2

vs

219524   WH_safe_X_data **= -1
 33966   WH_safe_X_data **= 2
test code below:
from sklearn.decomposition.nmf import non_negative_factorization
from sklearn.decomposition.nmf import _multiplicative_update_w
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
import time
from IPython import get_ipython
import numpy as np

ipython = get_ipython()
np.random.seed(10)
t0 = time.time()
all_samples, all_targets = make_classification(n_samples=1000, n_features=513, n_informative=511,
                                               n_redundant=2, n_repeated=0, n_classes=2,
                                               n_clusters_per_class=1, random_state=0)
all_samples += 5000
ipython.magic(
    "lprun -f _multiplicative_update_w non_negative_factorization(all_samples, n_components=16, solver='mu', beta_loss='itakura-saito', max_iter=100)")
|
Contributor
Author
|
Hi, sorry if I was not clear, but WH_safe_X_data **= -2 yields 4217895. Also, I'm not sure why the two-step version is much faster, but it seems to have something to do with how NumPy computes powers that are not positive integers. |
Member
|
That seems faster than either option in your original benchmarks, unless
I'm not reading it correctly.
…On 5 Jul 2017 6:35 pm, "hongkahjun" wrote:
Hi
Sorry if I was not clear but
WH_safe_X_data **= -2
yields
42178.9 WH_safe_X_data **= -2
|
Contributor
Author
WH_safe_X_data **= -2 yields 4,217,895
while
WH_safe_X_data **= -1 yields 219,524
WH_safe_X_data **= 2 yields 33,966
(i.e., the two-step version is about 17x faster overall) |
Member
|
Sounds like a NumPy issue, potentially. What NumPy configuration are you
using?
|
Contributor
Author
|
Hi, I am using NumPy 1.11.3. |
Member
|
numpy.show_config()?
And how do those benchmarks look on the latest NumPy (1.13)?
|
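
For reference, the requested build information can be printed with two standard NumPy calls (a quick sketch, not code from this thread):

import numpy as np

print(np.__version__)  # e.g. 1.11.3
np.show_config()       # shows which BLAS/LAPACK libraries NumPy was built against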
Member
|
Numpy uses different functions for power internally
<https://github.com/numpy/numpy/blob/b9e3ac9abb6e435cdf6bbe33e0bc894d6a879a53/numpy/core/src/multiarray/number.c#L464>:
- When the exponent is in {-1, 0, 0.5, 1, 2}, it uses respectively {reciprocal, ones_like, sqrt, ~identity, square}.
- For any other exponent, it uses a much slower routine. This is why a **= 2; a **= -1 is much faster than a **= -2.
A benchmark of a **= b; a **= -1 versus a **= -b gives me (v1.11.3):
[image: figure_1]
<https://user-images.githubusercontent.com/11065596/27860284-25fff35c-617c-11e7-8607-c07fb1b7aac1.png>
(Click on details to show the script)

import numpy as np
from time import time
import matplotlib.pyplot as plt

n_points = int(1e6)
power_range = np.arange(0, 4.1, 0.1)
durations = np.zeros((2, power_range.size))

array = np.random.randn(n_points)
np.abs(array, array)  # in-place absolute value, so negative powers are well defined

for i, power in enumerate(power_range):
    # one operation: a **= -b (generic power routine)
    array_copy = array.copy()
    start = time()
    array_copy **= -power
    durations[0, i] = time() - start

    # two operations: a **= b, then the reciprocal fast path
    array_copy = array.copy()
    start = time()
    array_copy **= power
    array_copy **= -1
    durations[1, i] = time() - start

plt.figure(figsize=(10, 4))
ax = plt.gca()
ax.plot(power_range, durations[0], '-o', label='one operation')
ax.plot(power_range, durations[1], '-o', label='two operations')
ax.set(xlabel='power', ylabel='time', title='Elementwise power in Numpy')
ax.legend()
plt.show() |
Member
|
Thanks, @TomDLT!
So it sounds like we should at least raise an issue at numpy to handle -0.5 and
-2.
I'm okay with merging this for now, but I'd appreciate a comment
referencing fast_scalar_power (or the issue one of us is about to create at
numpy).
|
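
As an aside, the same decomposition trick covers the -0.5 case mentioned above, since both factors hit NumPy's fast paths (a sketch for illustration, not code from this PR):

import numpy as np

a = np.abs(np.random.randn(1000)) + 1e-3  # strictly positive input

expected = a ** -0.5  # generic (slow) power routine

b = a.copy()
b **= 0.5   # fast path: np.sqrt
b **= -1    # fast path: np.reciprocal

assert np.allclose(b, expected)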
Member
|
@TomDLT, will you report this at NumPy? |
Contributor
Author
|
All right, I added a comment stating that the code uses NumPy's reciprocal function for exponent -1. |
jnothman
reviewed
Jul 5, 2017
sklearn/decomposition/nmf.py
Outdated
if beta_loss == 1:
    np.divide(X_data, WH_safe_X_data, out=WH_safe_X_data)
elif beta_loss == 0:
    # using numpy's reciprocal function for exponent -1
Member
If you're effectively using np.reciprocal and np.square, you could just do that here...
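
For concreteness, that direct form might look like the following (a sketch with stand-in data; the exact code merged into nmf.py may differ):

import numpy as np

WH_safe_X_data = np.abs(np.random.randn(10)) + 1e-3  # stand-in for the real array

# equivalent to WH_safe_X_data **= -2, since x**-2 == (1/x)**2,
# but spelled as two explicit in-place ufunc calls
np.reciprocal(WH_safe_X_data, out=WH_safe_X_data)
np.square(WH_safe_X_data, out=WH_safe_X_data)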
Member
|
LGTM |
ogrisel
approved these changes
Jul 5, 2017
Member
ogrisel
left a comment
+1 for merge once CI is green.
Member
|
Thanks @hongkahjun |
massich pushed a commit to massich/scikit-learn that referenced this pull request on Jul 13, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request on Aug 7, 2017
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request on Aug 7, 2017
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request on Aug 11, 2017
paulha pushed a commit to paulha/scikit-learn that referenced this pull request on Aug 19, 2017
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request on Aug 29, 2017
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request on Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request on Dec 18, 2017