Force-pushed from 0d48344 to 423661f

---
Exciting! :-) Please put the raw results into the S3 bucket. It would be great if you could run it with 100-NN as well to see how the parameter settings generalize.
---
I've been running it for about 3 days straight now, but it's super slow. Don't think I'm even through 1/3 of the Glove-100 benchmarks! I wonder if there's some way to speed it up. I was thinking index building could happen in parallel, for instance, although that might limit the amount of RAM available to each process.
---
I also think that would save a lot of time. For my local runs I remove the CPU cap and check that the implementations carry out the queries single-threaded. They either do this anyway, or it can be enforced through API calls, as in the case of FAISS. I usually run a very limited set of algorithms, though (Annoy, NGT, nmslib, FAISS, PUFFINN). Maybe we could work this out in a separate PR; I can spend some time on it next week.
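For the "enforced through API calls" part, a minimal sketch: OpenMP-backed libraries can be capped generically through an environment variable (it must be set before the library spins up its thread pool), and FAISS additionally exposes an explicit call for the same purpose.

```python
import os

# Cap OpenMP worker threads to 1 so query execution stays single-threaded.
# This must be set before importing any OpenMP-backed library (FAISS,
# nmslib, ...), since the thread pool is sized at initialization.
os.environ["OMP_NUM_THREADS"] = "1"

# FAISS also has an explicit API for this:
#   import faiss
#   faiss.omp_set_num_threads(1)
```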
---
I actually meant running multiple Docker containers at the same time. You could do it pretty easily by just using a thread pool for this loop: https://github.com/erikbern/ann-benchmarks/blob/master/ann_benchmarks/main.py#L211 An r6g.4xlarge costs roughly the same as a c5.4xlarge but has 128 GB of memory instead of 32 GB, and I think maybe 16 GB per container is enough. So you could in theory run 8 Docker instances simultaneously and limit each one to 1 CPU. That would complete the benchmarks 8x faster.
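A minimal sketch of that idea, assuming each run can be expressed as a standalone `docker run` invocation. The image name, flags, and function names below are illustrative, not the actual ann-benchmarks CLI:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def docker_cmd(args, mem_limit="8g"):
    # Illustrative invocation: pin each container to 1 CPU and cap its RAM
    # so N containers fit on the box. "ann-benchmarks" is a hypothetical
    # image name here.
    return ["docker", "run", "--rm", "--cpus", "1",
            "--memory", mem_limit, "ann-benchmarks", *args]

def run_parallel(jobs, parallelism=4, runner=None):
    """Run `jobs` (lists of CLI args) with at most `parallelism` containers
    at once; returns exit codes in job order."""
    if runner is None:
        runner = lambda job: subprocess.run(docker_cmd(job)).returncode
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(runner, jobs))
```

Threads (rather than processes) are enough here because each worker just blocks on a `docker run` subprocess; the containers themselves do the CPU work.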
---
Super quick prototype: #168 (make it possible to run multiple containers in parallel)

---
... with the parallelism patch merged, I'm now re-running this for Glove with parallelism set to 4 and the number of runs bumped to 5. I'm expecting it to take about 2 days (rather than a full week). I already set everything up on a c5.4xlarge, but next time I run all algos I'll use a higher-memory box so I can push parallelism up even more. The bottleneck right now is memory (32 GB / 4 = 8 GB per container, which seems like a minimum). EDIT: reduced parallelism to 3 since I saw a container crash due to (I think) hitting memory limits.
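The memory arithmetic above generalizes to a one-liner; a small helper sketch (instance figures taken from this discussion, function name is mine):

```python
def max_parallelism(total_ram_gb, per_container_gb=8, vcpus=16):
    """Number of single-CPU containers that fit, limited by whichever runs
    out first: RAM slices or vCPUs."""
    return min(total_ram_gb // per_container_gb, vcpus)

# c5.4xlarge:  32 GB RAM, 16 vCPUs -> 4 containers at 8 GB each
# r6g.4xlarge: 128 GB RAM, 16 vCPUs -> 16 at 8 GB, or 8 at 16 GB
```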
---
Excellent work! Also, would you mind sharing the source data for all the result figures?
---
Yes, will share everything once it's done!
---
Thanks!
---
Accidentally pushed to master. Will create a new pull request and un-merge the merged changes.

Running everything from scratch on a c5.4xlarge instance. Expecting it to take a few weeks. I'll push changes to this branch as I encounter anything. Should be mostly pretty minor stuff!