{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T11:08:25Z","timestamp":1767611305946,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2022,5,23]],"date-time":"2022-05-23T00:00:00Z","timestamp":1653264000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62162059","12061072","2018YFC0825504"],"award-info":[{"award-number":["62162059","12061072","2018YFC0825504"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key R & D plan project","award":["62162059","12061072","2018YFC0825504"],"award-info":[{"award-number":["62162059","12061072","2018YFC0825504"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Three-dimensional object detection in point clouds can provide more accurate object information for autonomous driving. In this paper, we propose a method named MA-MFFC that uses an attention mechanism and a multi-scale feature fusion network with a ConvNeXt module to improve the accuracy of object detection. The multi-attention (MA) module contains point-channel attention and voxel attention, applied during voxelization and in the 3D backbone. By weighting features both point-wise and channel-wise, the attention mechanism enhances the information of key points within voxels, suppresses background points during voxelization, and improves the robustness of the network. The voxel attention module is used in the 3D backbone to obtain more robust and discriminative voxel features. 
The MFFC module contains the multi-scale feature fusion network and the ConvNeXt module: the multi-scale feature fusion network extracts richer feature information to improve detection accuracy, and the standard convolutional layers are replaced with ConvNeXt modules to enhance the feature extraction capability of the network. Experimental results on the KITTI dataset show an average accuracy of 64.60% for pedestrians and 80.92% for cyclists, 1.33% and 2.1% higher, respectively, than the baseline network, enabling more accurate detection and localization of the more difficult object classes.<\/jats:p>","DOI":"10.3390\/s22103935","type":"journal-article","created":{"date-parts":[[2022,5,24]],"date-time":"2022-05-24T03:16:55Z","timestamp":1653362215000},"page":"3935","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["3D Object Detection Based on Attention and Multi-Scale Feature Fusion"],"prefix":"10.3390","volume":"22","author":[{"given":"Minghui","family":"Liu","sequence":"first","affiliation":[{"name":"School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0020-6856","authenticated-orcid":false,"given":"Jinming","family":"Ma","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}]},{"given":"Qiuping","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}]},{"given":"Yuchen","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Xinjiang University, Urumqi 830046, China"}]},{"given":"Gang","family":"Shi","sequence":"additional","affiliation":[{"name":"School of Information Science and Engineering, Xinjiang University, Urumqi 830046, 
China"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"4338","DOI":"10.1109\/TPAMI.2020.3005434","article-title":"Deep learning for 3D point clouds: A survey","volume":"43","author":"Guo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Shi, S., Wang, X., and Li, H. (2019, January 15\u201320). Pointrcnn: 3D object proposal generation and detection from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00086"},{"key":"ref_3","unstructured":"Qi, C.R., Su, H., Mo, K., and Guibas, L.J. (2017, January 21\u201326). Pointnet: Deep learning on point sets for 3D classification and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA."},{"key":"ref_4","unstructured":"Qi, C.R., Yi, L., Su, H., and Guibas, L.J. (2017, January 4\u20139). Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhou, Y., and Tuzel, O. (2018, January 18\u201323). Voxelnet: End-to-end learning for point cloud based 3D object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00472"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lang, A.H., Vora, S., Caesar, H., Zhou, L., Yang, J., and Beijbom, O. (2019, January 15\u201320). Pointpillars: Fast encoders for object detection from point clouds. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01298"},{"key":"ref_7","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., and Xie, S. (2022). A ConvNet for the 2020s. arXiv.","DOI":"10.1109\/CVPR52688.2022.01167"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Chen, X., Ma, H., Wan, J., Li, B., and Xia, T. (2017, January 21\u201326). Multi-view 3D object detection network for autonomous driving. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.691"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Qi, C.R., Liu, W., Wu, C., Su, H., and Guibas, L.J. (2018, January 18\u201323). Frustum pointnets for 3D object detection from rgb-d data. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00102"},{"key":"ref_13","first-page":"2647","article-title":"From points to parts: 3D object detection from point cloud with part-aware and part-aggregation network","volume":"43","author":"Shi","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","unstructured":"Qi, C.R., Litany, O., He, K., and Guibas, L.J. (November, January 27). Deep hough voting for 3D object detection in point clouds. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, Z., Wang, F., and Wang, N. (2021, January 20\u201325). Lidar r-cnn: An efficient and universal 3D object detector. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00746"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yan, Y., Mao, Y., and Li, B. (2018). Second: Sparsely embedded convolutional detection. Sensors, 18.","DOI":"10.3390\/s18103337"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Shi, S., Guo, C., Jiang, L., Wang, Z., Shi, J., Wang, X., and Li, H. (2020, January 13\u201319). Pv-rcnn: Point-voxel feature set abstraction for 3D object detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01054"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Deng, J., Shi, S., Li, P., Zhou, W., Zhang, Y., and Li, H. (2021, January 2\u20139). Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection. Proceedings of the AAAI Conference on Artificial Intelligence, Virtually.","DOI":"10.1609\/aaai.v35i2.16207"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yin, T., Zhou, X., and Krahenbuhl, P. (2021, January 20\u201325). 
Center-based 3D object detection and tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01161"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Zheng, W., Tang, W., Jiang, L., and Fu, C.W. (2021, January 20\u201325). SE-SSD: Self-ensembling single-stage object detector from point cloud. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01426"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Xu, Q., Zhong, Y., and Neumann, U. (2021). Behind the Curtain: Learning Occluded Shapes for 3D Object Detection. arXiv.","DOI":"10.1609\/aaai.v36i3.20194"},{"key":"ref_22","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Bridging nonlinearities and stochastic regularizers with Gaussian error linear units. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_27","unstructured":"Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y., and Sun, J. (November, January 27). ThunderNet: Towards real-time generic object detection on mobile devices. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Geiger, A., Lenz, P., and Urtasun, R. (2012, January 16\u201321). Are we ready for autonomous driving? The kitti vision benchmark suite. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248074"},{"key":"ref_30","unstructured":"Loshchilov, I., and Hutter, F. (2017). Fixing weight decay regularization in adam. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Smith, L.N. (2017, January 24\u201331). Cyclical learning rates for training neural networks. Proceedings of the 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.58"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Liu, Z., Zhao, X., Huang, T., Hu, R., Zhou, Y., and Bai, X. (2020, January 7\u201312). 
Tanet: Robust 3D object detection from point clouds with triple attention. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6837"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3935\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:16:43Z","timestamp":1760138203000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/10\/3935"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,23]]},"references-count":32,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2022,5]]}},"alternative-id":["s22103935"],"URL":"https:\/\/doi.org\/10.3390\/s22103935","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,5,23]]}}}