Power BA - Speedup for large number of images #2567
(cherry picked from commit 948583f)
Thanks for the PR. I tried this a while ago but did not find a real speedup on my own datasets (up to 1000 images), while accuracy/completeness went down a bit compared to master. On which dataset did you try this? I am also not sure how to interpret your plots. Could you please clarify why, after ~7/15 hours, the curve sharply falls back to ~0 images? Is it because smaller components for the remaining images are reconstructed later? Looking at these plots, it seems that, on your dataset, the current thresholds in master, where we change from the direct to the indirect sparse solver after 1000 images, may not be a good choice. Continuing with the direct sparse solver seems to be faster - even faster than the PowerBA approach?
My own dataset of historic photographs. There are quite a few degenerate cameras, so it wouldn't be suitable for accuracy benchmarking. I'm open to rerunning on other datasets of similar size - any recommendations?
Yes, exactly.
The difference up to 1000 images is only caused by randomness, since I only enable PowerBA after that. The dataset is well behaved until around 500 images. PowerBA clearly shows a 2x+ speedup, which could decrease depending on the accuracy threshold.
Do you have any intuition about the parameters and how to choose them for better accuracy? I noticed that, with your proposed (i.e., Ceres' default) parameters, PowerBA always runs to the maximum of 50 iterations without converging. Many intermediate steps are reported as invalid, which can be seen when setting …
@Dawars My experience trying to find generally good options for PowerBA while integrating it into ceres-solver aligns with @ahojnnes's. The speedups I managed to achieve were tightly coupled to the specific dataset/options/stopping criteria, and I failed to find a set of options that is good in general. A bit tangential to this PR, but if you are interested in cheap speedups of colmap's global BA: there is currently a WIP to integrate Nvidia's cuDSS (a GPU-accelerated sparse Cholesky solver) as a sparse direct backend in ceres-solver. It should allow quite significant speedups in direct mode. Also, colmap's criterion for switching from the direct to the iterative solver might indeed be tuned a bit (a good start might be to move it into the BA options struct, making it externally configurable and enabling cheap experimentation).
Bundle adjustment slows down as the number of images increases.
Currently, DENSE_SCHUR is used below 100 images, SPARSE_SCHUR between 100 and 1000 images, and an iterative solver above that.
The Power Bundle Adjustment paper (https://arxiv.org/abs/2204.12834, by @simonwebertum @NikolausDemmel) proposes an approximation based on a truncated power series expansion of the inverse Schur complement to speed up bundle adjustment for large-scale SfM.
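For context, here is my rough paraphrase of the idea (see the paper and the Ceres docs for the exact statement and convergence conditions; the notation below is the usual Schur-complement one and may differ from theirs):

```latex
% Schur complement of the camera block B against the point block C:
S = B - E C^{-1} E^{T}
% Its inverse written as a power (Neumann) series, valid when the spectral
% radius of B^{-1} E C^{-1} E^{T} is below 1:
S^{-1} = \sum_{i=0}^{\infty} \left( B^{-1} E C^{-1} E^{T} \right)^{i} B^{-1}
% PowerBA truncates this sum at a small order, trading accuracy for speed.
```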
This is implemented in Ceres 2.2+ and enabling it is very simple: http://ceres-solver.org/nnls_solving.html#schur-power-series-expansion
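For reference, a minimal sketch of the relevant solver options (untested; names follow the Ceres documentation linked above, and the values are illustrative defaults, not tuned hyperparameters):

```cpp
#include <ceres/ceres.h>

// Sketch: configure Ceres (>= 2.2) to use the Schur power series expansion
// preconditioner, plus an SPSE warm start for the iterative Schur solver.
void ConfigurePowerBA(ceres::Solver::Options* options) {
  options->linear_solver_type = ceres::ITERATIVE_SCHUR;
  options->preconditioner_type = ceres::SCHUR_POWER_SERIES_EXPANSION;
  // Optionally initialize the iterative solver with an SPSE solution.
  options->use_spse_initialization = true;
  options->max_num_spse_iterations = 5;  // truncation order of the series
  options->spse_tolerance = 0.1;         // stop the series early on convergence
}
```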
The hyperparameters might need some tuning; I haven't benchmarked accuracy.
Also, if the accuracy is satisfactory, it might make sense to enable PowerBA for the 100+ images case as well.
My benchmark comparing the runtime to the current implementation:

I made two Docker images for the experiment: dawars/colmap:torch2.3.0-cu11.8-powerba for PowerBA and dawars/colmap:torch2.3.0-cu11.8-baseline for the baseline.