
New run, June 2020 #166

Merged
erikbern merged 11 commits into master from new-run-june-2020 on Jul 13, 2020

Conversation

@erikbern
Owner

Running everything from scratch on a c5.4xlarge instance. Expecting it to take a few weeks. I'll push changes to this branch as I encounter anything. Should be mostly pretty minor stuff!

@erikbern erikbern force-pushed the new-run-june-2020 branch from 0d48344 to 423661f on June 29, 2020 03:45
@maumueller
Collaborator

Exciting! :-) Please put the raw results into the S3 bucket.

It would be great if you could run it with 100-NN as well to see how the parameter settings generalize.

@erikbern
Owner Author

I've been running it for about 3 days straight now, but it's super slow. Don't think I'm even through 1/3 of the Glove-100 benchmarks! I wonder if there's some way to speed it up. Was thinking maybe index building could happen in parallel for instance, although it might limit the amount of RAM available to each process.

@maumueller
Collaborator

I also think that would save a lot of time. For my local runs I remove the CPU cap and check that the implementations carry out the queries single-threaded. They either do this anyway, or it can be enforced through API calls, as in the case of FAISS. I usually run a very limited set of algorithms, though (Annoy, ngt, nmslib, faiss, puffinn).

Maybe we could have a separate PR to work this out. I can spend some time on that next week.
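For libraries that use OpenMP internally (FAISS does), the single-threaded constraint can be enforced programmatically. A minimal sketch, assuming the environment-variable route; FAISS also exposes `faiss.omp_set_num_threads`:

```python
import os

# Pin OpenMP-based libraries (e.g. FAISS) to a single thread so queries
# run single-threaded during benchmarking. The variable must be set
# before the library is first imported.
os.environ["OMP_NUM_THREADS"] = "1"

# Alternatively, after importing FAISS, the same limit can be set at runtime:
#   import faiss
#   faiss.omp_set_num_threads(1)
```

Other implementations may need their own per-library switch; the environment variable only covers the OpenMP-based ones.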

@erikbern
Owner Author

erikbern commented Jul 1, 2020

I actually meant running multiple Docker containers at the same time. You could do it pretty easily by just using a thread pool for this loop: https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/main.py#L211

An r6g.4xlarge costs roughly the same as a c5.4xlarge but has 128 GB of memory instead of 32 GB, and I think maybe 16 GB per container is enough. So you could in theory run 8 Docker instances simultaneously and limit each one to 1 CPU. That would complete the benchmarks 8x faster.
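The thread-pool idea for that loop can be sketched like this; `run_container` is a hypothetical stand-in for the code in `main.py` that launches one Docker container per algorithm definition and blocks until it finishes (the definition names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_container(definition):
    # Hypothetical stand-in for starting a Docker container for one
    # algorithm definition and waiting for its benchmark run to finish.
    return f"finished {definition}"

# Illustrative definitions; the real loop iterates over the definitions
# built in ann_benchmarks/main.py.
definitions = ["annoy", "faiss-ivf", "nmslib-hnsw"]

# Threads suffice here: each worker mostly blocks on a subprocess
# (the container), so the GIL is not a bottleneck.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_container, definitions))
```

With each container pinned to 1 CPU, the pool size is then bounded by cores and by per-container memory rather than by the loop being serial.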

@erikbern
Owner Author

erikbern commented Jul 1, 2020

Super quick prototype: #168

@erikbern
Owner Author

erikbern commented Jul 2, 2020

... with the parallelism patch merged, I'm now re-running this for Glove with parallelism set to 4 and the number of runs bumped to 5. I'm expecting it to take about 2 days (rather than a full week).

I already set everything up on a c5.4xlarge but next time I run all algos, I'll use a higher-memory box so I can get parallelism up even more. The bottleneck is memory right now (32GB/4 = 8GB per container, which seems like a minimum).

EDIT: reduced parallelism to 3 after I saw a container crash, I think from hitting memory limits.
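The memory arithmetic above fits in a tiny helper (a sketch; the function name and figures are just illustrative of the reasoning, not code from the repo):

```python
def max_parallelism(total_ram_gb, per_container_gb):
    # Number of containers that fit in RAM, rounding down.
    return total_ram_gb // per_container_gb

# c5.4xlarge:  32 GB total at ~8 GB per container  -> parallelism 4
# r6g.4xlarge: 128 GB total at ~16 GB per container -> parallelism 8
```

In practice a little headroom below the theoretical maximum avoids the OOM crashes mentioned above.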

@erikbern
Owner Author

erikbern commented Jul 4, 2020

glove-100-angular results so far. This is after about 24h with parallelism 3. Will replace with final ones shortly. scann is doing well!

[image: glove-100-angular results plot]

@leovan

leovan commented Jul 6, 2020

Excellent work! Also, would you mind sharing the source data for all the result figures?

@erikbern
Owner Author

erikbern commented Jul 6, 2020

Yes, I'll share everything once it's done!

@leovan

leovan commented Jul 7, 2020

Thanks!

@erikbern
Owner Author

erikbern commented Jul 9, 2020

The gist-960-euclidean dataset is taking forever to run since I can't run it with parallelism due to memory constraints. I've had it running for 48 hours but I'm only about 1/4 through the data. Going to leave it out of the benchmark this time. I'll re-run the benchmarks shortly on a machine with more RAM so I can increase the parallelism.

@erikbern erikbern mentioned this pull request Jul 13, 2020
@erikbern erikbern merged commit 9901c22 into master Jul 13, 2020
@erikbern
Owner Author

Accidentally pushed to master. Will create a new pull request and un-merge the merged changes.
