Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15118)

Abstract

Dynamic stereo matching is the task of estimating consistent disparities from stereo videos with dynamic objects. Recent learning-based methods prioritize optimal performance on a single stereo pair, resulting in temporal inconsistencies. Existing video methods apply per-frame matching and window-based cost aggregation across the time dimension, leading to low-frequency oscillations at the scale of the window size. To address this challenge, we develop a bidirectional alignment mechanism for adjacent frames as a fundamental operation. We further propose a novel framework, BiDAStereo, that achieves consistent dynamic stereo matching. Unlike existing methods, we model this task as local matching and global aggregation. Locally, we consider correlation in a triple-frame manner to pool information from adjacent frames and improve temporal consistency. Globally, to exploit the consistency of the entire sequence and extract dynamic scene cues for aggregation, we develop a motion-propagation recurrent unit. Extensive experiments demonstrate the effectiveness of our method, showing improvements in prediction quality and state-of-the-art results on commonly used benchmarks.

Notes

  1. \(\textrm{TEPE}(\textbf{d}, \textbf{d}_{\textrm{gt}})=\sqrt{\sum _{t=1}^{T-1}((\textbf{d}^{t} - \textbf{d}^{t+1}) - (\textbf{d}_{\textrm{gt}}^{t} - \textbf{d}_{\textrm{gt}}^{t+1}))^{2}} \).
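As an illustration of the TEPE definition in the note above, the following is a minimal Python sketch rather than the authors' evaluation code. It assumes that each frame is given as a flat list of per-pixel disparities and that the square in the formula is applied element-wise, so that one value is returned per pixel. The function name `tepe_per_pixel` and the input layout are hypothetical, and the note does not state how per-pixel values are aggregated into a single score.

```python
import math


def tepe_per_pixel(pred, gt):
    """Per-pixel TEPE following the formula in the note (a sketch).

    pred, gt: lists of T frames, each a flat list of per-pixel disparities.
    This layout is an assumption made for illustration.
    """
    if len(pred) != len(gt) or len(pred) < 2:
        raise ValueError("need two sequences of equal length T >= 2")
    n_pixels = len(pred[0])
    sq_sums = [0.0] * n_pixels
    # Sum over t = 1 .. T-1 (0-based: 0 .. T-2) of the squared difference
    # between the predicted and the ground-truth frame-to-frame change.
    for t in range(len(pred) - 1):
        for i in range(n_pixels):
            pred_change = pred[t][i] - pred[t + 1][i]
            gt_change = gt[t][i] - gt[t + 1][i]
            sq_sums[i] += (pred_change - gt_change) ** 2
    return [math.sqrt(s) for s in sq_sums]


# A prediction that differs from the ground truth by a constant offset is
# perfectly consistent over time, so its TEPE is zero at every pixel.
gt_seq = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]
pred_seq = [[d + 5.0 for d in frame] for frame in gt_seq]
print(tepe_per_pixel(pred_seq, gt_seq))
```

Because only frame-to-frame changes enter the formula, a constant disparity bias does not affect the result. The metric measures temporal consistency, not per-frame accuracy.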

Acknowledgments

This work was funded by the Imperial College-China Scholarship Council.

Author information

Corresponding author

Correspondence to Junpeng Jing.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jing, J., Mao, Y., Mikolajczyk, K. (2025). Match-Stereo-Videos: Bidirectional Alignment for Consistent Dynamic Stereo Matching. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15118. Springer, Cham. https://doi.org/10.1007/978-3-031-73027-6_24

  • DOI: https://doi.org/10.1007/978-3-031-73027-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73026-9

  • Online ISBN: 978-3-031-73027-6

  • eBook Packages: Computer Science, Computer Science (R0)
