Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching

Jing, Junpeng; Mao, Ye; Mikolajczyk, Krystian

doi:10.1007/978-3-031-73027-6_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15118)

Included in the following conference series:

European Conference on Computer Vision

Abstract

Dynamic stereo matching is the task of estimating consistent disparities from stereo videos with dynamic objects. Recent learning-based methods prioritize optimal performance on a single stereo pair, which results in temporal inconsistencies. Existing video methods apply per-frame matching and window-based cost aggregation across the time dimension, leading to low-frequency oscillations at the scale of the window size. To address this challenge, we develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation. We further propose a novel framework, BiDAStereo, that achieves consistent dynamic stereo matching. Unlike existing methods, we model this task as local matching and global aggregation. Locally, we consider correlation in a triple-frame manner to pool information from adjacent frames and improve temporal consistency. Globally, to exploit the entire sequence’s consistency and extract dynamic scene cues for aggregation, we develop a motion-propagation recurrent unit. Extensive experiments demonstrate the performance of our method, showing improvements in prediction quality and achieving state-of-the-art (SoTA) results on commonly used benchmarks.

Notes

1. \(\textrm{TEPE}(\textbf{d}, \textbf{d}_{\textrm{gt}})=\sqrt{\sum_{t=1}^{T-1}((\textbf{d}^{t} - \textbf{d}^{t+1}) - (\textbf{d}_{\textrm{gt}}^{t} - \textbf{d}_{\textrm{gt}}^{t+1}))^{2}}\).
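
As a minimal sketch of how the expression in Note 1 can be evaluated, the Python snippet below computes TEPE for the disparity sequence of a single pixel over T frames. The function name `tepe` and the restriction to one pixel are assumptions made for illustration; the note does not state how values are aggregated over the pixels of a disparity map, and this is not the authors' evaluation code.

```python
import math


def tepe(d, d_gt):
    """Temporal end-point error for one pixel's disparity sequence.

    d and d_gt are equal-length sequences of predicted and ground-truth
    disparities, one value per frame (hypothetical input layout).
    """
    if len(d) != len(d_gt):
        raise ValueError("d and d_gt must have the same number of frames")
    total = 0.0
    # Sum over adjacent frame pairs (t, t+1), i.e. t = 1 .. T-1 in the note.
    for t in range(len(d) - 1):
        pred_change = d[t] - d[t + 1]        # predicted temporal change
        gt_change = d_gt[t] - d_gt[t + 1]    # ground-truth temporal change
        total += (pred_change - gt_change) ** 2
    return math.sqrt(total)


# A prediction that is biased by a constant but temporally stable scores 0,
# while a prediction that flickers around a static ground truth does not.
stable = tepe([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
flicker = tepe([1.0, 1.5, 1.0], [1.0, 1.0, 1.0])
```

Because only frame-to-frame changes enter the sum, the measure penalizes temporal flicker rather than a constant per-frame disparity offset.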

Acknowledgments

This work was funded by the Imperial College-China Scholarship Council.

Author information

Authors and Affiliations

Imperial College London, London, UK
Junpeng Jing, Ye Mao & Krystian Mikolajczyk

Corresponding author

Correspondence to Junpeng Jing.

Editor information

Editors and Affiliations

University of Birmingham, Birmingham, UK
Aleš Leonardis
University of Trento, Trento, Italy
Elisa Ricci
Technical University of Darmstadt, Darmstadt, Germany
Stefan Roth
Princeton University, Princeton, NJ, USA
Olga Russakovsky
Czech Technical University in Prague, Prague, Czech Republic
Torsten Sattler
École des Ponts ParisTech, Marne-la-Vallée, France
Gül Varol

Cite this paper

Jing, J., Mao, Y., Mikolajczyk, K. (2025). Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15118. Springer, Cham. https://doi.org/10.1007/978-3-031-73027-6_24

DOI: https://doi.org/10.1007/978-3-031-73027-6_24
Published: 26 November 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73026-9
Online ISBN: 978-3-031-73027-6
eBook Packages: Computer Science (R0)
