
New run, June 2020 #166

Merged
erikbern merged 11 commits into master from new-run-june-2020 on Jul 13, 2020

Conversation

@erikbern
Owner

Running everything from scratch on a c5.4xlarge instance. Expecting it to take a few weeks. I'll push changes to this branch as I encounter anything. Should be mostly pretty minor stuff!

@erikbern erikbern force-pushed the new-run-june-2020 branch from 0d48344 to 423661f on June 29, 2020 03:45
@maumueller
Collaborator

Exciting! :-) Please put the raw results into the S3 bucket.

It would be great if you could run it with 100-NN as well to see how the parameter settings generalize.

@erikbern
Owner Author

I've been running it for about 3 days straight now, but it's super slow. Don't think I'm even through 1/3 of the Glove-100 benchmarks! I wonder if there's some way to speed it up. Was thinking maybe index building could happen in parallel for instance, although it might limit the amount of RAM available to each process.

@maumueller
Collaborator

I also think that would save a lot of time. For my local runs I remove the CPU cap and check that the implementations carry out the queries single-threaded. They either do this anyway, or it can be enforced through API calls, as in the case of FAISS. I usually run a very limited set of algorithms, though (Annoy, ngt, nmslib, faiss, puffinn).

Maybe we could have a separate PR to work this out. I can spend some time on that next week.
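For libraries that use OpenMP internally (FAISS does), the single-threaded constraint can be enforced programmatically. A minimal sketch, assuming the environment-variable route; FAISS also exposes `faiss.omp_set_num_threads`:

```python
import os

# Pin OpenMP-based libraries (e.g. FAISS) to a single thread so queries
# run single-threaded during benchmarking. The variable must be set
# before the library is first imported.
os.environ["OMP_NUM_THREADS"] = "1"

# Alternatively, after importing FAISS, the same limit can be set at runtime:
#   import faiss
#   faiss.omp_set_num_threads(1)
```

Other implementations may need their own per-library switch; the environment variable only covers the OpenMP-based ones.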

@erikbern
Owner Author

erikbern commented Jul 1, 2020

I actually meant running multiple Docker containers at the same time. You could do it pretty easily by just using a thread pool for this loop: https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/main.py#L211

An r6g.4xlarge costs roughly the same as a c5.4xlarge but has 128 GB of memory instead of 32 GB, and I think maybe 16 GB per container is enough. So you could in theory run 8 Docker instances simultaneously and limit each one to 1 CPU. That would complete the benchmarks 8x faster.
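The thread-pool idea for that loop can be sketched like this; `run_container` is a hypothetical stand-in for the code in `main.py` that launches one Docker container per algorithm definition and blocks until it finishes (the definition names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def run_container(definition):
    # Hypothetical stand-in for starting a Docker container for one
    # algorithm definition and waiting for its benchmark run to finish.
    return f"finished {definition}"

# Illustrative definitions; the real loop iterates over the definitions
# built in ann_benchmarks/main.py.
definitions = ["annoy", "faiss-ivf", "nmslib-hnsw"]

# Threads suffice here: each worker mostly blocks on a subprocess
# (the container), so the GIL is not a bottleneck.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_container, definitions))
```

With each container pinned to 1 CPU, the pool size is then bounded by cores and by per-container memory rather than by the loop being serial.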

@erikbern
Owner Author

erikbern commented Jul 1, 2020

Super quick prototype: #168

@erikbern
Owner Author

erikbern commented Jul 2, 2020

... with the parallelism patch merged, I'm now re-running this for Glove with parallelism set to 4 and the number of runs bumped to 5. I'm expecting it to take about 2 days (rather than a full week).

I already set everything up on a c5.4xlarge but next time I run all algos, I'll use a higher-memory box so I can get parallelism up even more. The bottleneck is memory right now (32GB/4 = 8GB per container, which seems like a minimum).

EDIT: reduced parallelism to 3 after I saw a container crash, I think from hitting memory limits.
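The memory arithmetic above fits in a tiny helper (a sketch; the function name and figures are just illustrative of the reasoning, not code from the repo):

```python
def max_parallelism(total_ram_gb, per_container_gb):
    # Number of containers that fit in RAM, rounding down.
    return total_ram_gb // per_container_gb

# c5.4xlarge:  32 GB total at ~8 GB per container  -> parallelism 4
# r6g.4xlarge: 128 GB total at ~16 GB per container -> parallelism 8
```

In practice a little headroom below the theoretical maximum avoids the OOM crashes mentioned above.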

@erikbern
Owner Author

erikbern commented Jul 4, 2020

glove-100-angular results so far. This is after about 24h with parallelism 3. Will replace with final ones shortly. scann is doing well!

[image: glove-100-angular results plot]

@leovan

leovan commented Jul 6, 2020

Excellent work! Also, would you mind sharing the source data for all the result figures?

@erikbern
Owner Author

erikbern commented Jul 6, 2020

Yes, I'll share everything once it's done!

@leovan

leovan commented Jul 7, 2020

Thanks!

@erikbern
Owner Author

erikbern commented Jul 9, 2020

The gist-960-euclidean dataset is taking forever to run since I can't run it with parallelism due to memory constraints. I've had it running for 48 hours but I'm only about 1/4 through the data. Going to leave it out of the benchmark this time. I'll re-run the benchmarks shortly on a machine with more RAM so I can increase the parallelism.

@erikbern erikbern mentioned this pull request Jul 13, 2020
@erikbern erikbern merged commit 9901c22 into master Jul 13, 2020
@erikbern
Owner Author

Accidentally pushed to master. Will create a new pull request and un-merge the merged changes.
