-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-9224][MLlib]OnlineLDA Performance Improvements #7454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-9224][MLlib]OnlineLDA Performance Improvements #7454
Conversation
|
Test build #37550 has finished for PR 7454 at commit
|
|
Test build #37553 has finished for PR 7454 at commit
|
|
Test build #37610 has finished for PR 7454 at commit
|
|
I'll make a pass. Can you please make a JIRA for this and put it in the title? Also, can you please test this to verify the speedups? It sounds like local tests could suffice, based on the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a val since you use Breeze to mutate the internals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, fixed.
|
(maybe not; see below first) I ran: and got the exception: |
|
I'm wondering if it's a mis-matched shape issue. |
|
Ohh, actually, it might be from me trying to stats...which might be some weird Breeze object which does not implement toString properly. Let me retry |
|
Hm, no, I think something is wrong. Can you try running the example as I wrote above? |
|
I played with it this morning. The bugs were occurring because Since |
|
Oh, I see. Thanks for investigating! In my example, the numbers of terms is limited to 10 (so I could print the topics), probably making some documents empty. This LGTM pending tests, but can you please make a starter JIRA for adding a unit test which tests online LDA with empty documents? You may need to note in it that only SparseVectors can be empty. Thanks. |
|
Ran some local perf tests. Training Time (sec):
Training Time (sec):
|
|
Test build #37968 has finished for PR 7454 at commit
|
|
Merging with master. Thanks! |
In-place updates, reduce number of transposes, and vectorize operations in OnlineLDA implementation.