ENH vectorize labeled ranking average precision #3

Closed
jnothman wants to merge 7 commits into arjoly:lrap from jnothman:lrap

Conversation

@jnothman

No description provided.

@jnothman
Author

Sorry, I got distracted by this. It would be nice not to have to use O(n_labels ** 2) memory, but I haven't found a nice way to avoid it.
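For context, a minimal sketch of the fully vectorized approach (hypothetical code, not the PR's exact diff): the pairwise score comparison materialises an (n_samples, n_labels, n_labels) boolean array, which is where the quadratic memory in n_labels comes from.

```python
import numpy as np

def lrap_vectorized(y_true, y_score):
    # Hypothetical sketch of fully vectorized label ranking average
    # precision. The comparison below builds an
    # (n_samples, n_labels, n_labels) boolean array: this is the source
    # of the quadratic memory usage discussed in this thread.
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # ge[i, j, k]: does label k score at least as high as label j in sample i?
    ge = y_score[:, None, :] >= y_score[:, :, None]
    rank = ge.sum(axis=2)  # rank of each label (1-based, ties share the max)
    # number of *relevant* labels ranked at or above each label
    rel_rank = (ge & y_true[:, None, :]).sum(axis=2)
    per_label = np.where(y_true, rel_rank / rank, 0.0)
    # assumes every sample has at least one relevant label
    return float(np.mean(per_label.sum(axis=1) / y_true.sum(axis=1)))
```

On the toy pair of samples from the scikit-learn docs (`y_true = [[1, 0, 0], [0, 0, 1]]`, `y_score = [[0.75, 0.5, 1], [1, 0.2, 0.1]]`) this gives 5/12, the expected LRAP value.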

@arjoly
Owner

arjoly commented Apr 24, 2014

Thanks a lot for your help!!

However, I fear that it will not scale with the number of labels.
With this benchmark script (https://gist.github.com/arjoly/11259383), I got the following timing results:

n_labels = 500
-------------------------------------
original      1.83617901802s
vectorized   16.3870398998s

n_labels = 750
-------------------------------------
original      2.10820603371s
vectorized  114.211375952s

Note that one of my applications has around 20000 samples and 4000 labels.

@jnothman
Author

Okay. I thought this might be an issue. :(

I'll keep thinking about whether there's a linear memory solution.
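One possible direction for a linear-memory version, sketched under my own assumptions (this is not the PR's code; `_rank_ge` is a hypothetical helper): loop over samples and compute ranks by sorting, so per-sample memory is O(n_labels) and time is O(n_labels log n_labels).

```python
import numpy as np

def _rank_ge(scores):
    # For each score, count how many scores are >= it
    # (tied labels share the larger rank). Hypothetical helper.
    order = np.sort(scores)
    return len(scores) - np.searchsorted(order, scores, side="left")

def lrap_looped(y_true, y_score):
    # Per-sample loop: memory stays linear in n_labels instead of
    # materialising an (n_samples, n_labels, n_labels) array.
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    total = 0.0
    for truth, scores in zip(y_true, y_score):
        rank = _rank_ge(scores)[truth]      # rank among all labels
        rel_rank = _rank_ge(scores[truth])  # rank among relevant labels only
        total += np.mean(rel_rank / rank)
    return total / len(y_true)
```

This matches the vectorized result on small inputs (e.g. 5/12 on the two-sample toy case from the scikit-learn docs), while trading the quadratic intermediate array for a Python-level loop.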


@jnothman
Author

Btw, that benchmark result is probably explained by the fact that in the n_labels = 500 case the vectorized version is allocating a 19GB array, if my calculations are correct.
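As a rough sanity check on the footprint (my own arithmetic; `pairwise_array_bytes` is a hypothetical helper, and the sample count and itemsize behind the 19GB figure depend on the benchmark script, which I won't guess): even a one-byte boolean pairwise array is hopeless at the scale of the application mentioned above.

```python
def pairwise_array_bytes(n_samples, n_labels, itemsize=1):
    # Size of an (n_samples, n_labels, n_labels) array; itemsize=1 for a
    # NumPy boolean comparison matrix, 8 for float64. Hypothetical helper.
    return n_samples * n_labels ** 2 * itemsize

# ~20000 samples and ~4000 labels, boolean itemsize:
print(pairwise_array_bytes(20000, 4000) / 1e9)  # 320.0 (GB)
```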


@arjoly arjoly closed this Sep 1, 2014
