Abstract
A reliable perception system is the core component of a self-driving car and a prerequisite for the safe operation of autonomous vehicles in complex situations. To improve object detection performance in complex scenes, this study proposes a lightweight and efficient detection framework that integrates three key components. First, Receptive Field Attention Convolution (RFAConv) is introduced to optimize spatial feature representation within the receptive field, strengthening the model's focus on target regions and improving detection accuracy for small and cluttered objects. Second, a Top-K Sparse Kernel Attention (TKSA) mechanism in the backbone selectively attends to the most relevant feature regions, improving long-range perception while maintaining computational efficiency. Finally, Focaler-IoU is adopted as the bounding-box regression loss; it dynamically reweights each sample's contribution according to its IoU, improving stability and accelerating training convergence. Together, these enhancements substantially improve detection accuracy and robustness, particularly in challenging environments. Experiments on the KITTI dataset show that the proposed detector achieves an mAP@0.5 of 83% and an mAP@0.5:0.95 of 55.4%, gains of 5.2% and 3.9%, respectively, over the YOLOv11n baseline.
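The top-k sparse attention idea described above can be illustrated with a minimal sketch: instead of a dense softmax over all attention scores, only the k largest scores per query keep weight and the rest are masked out. This is a generic illustration of the mechanism, not the paper's TKSA implementation; the function name and shapes are assumptions.

```python
import numpy as np

def topk_sparse_attention(scores: np.ndarray, k: int) -> np.ndarray:
    """Softmax over only the k largest scores in each row; all other
    positions receive exactly zero attention weight."""
    # indices of the top-k scores along the last axis
    idx = np.argpartition(scores, -k, axis=-1)[..., -k:]
    # mask everything else with -inf so exp(.) maps it to 0
    masked = np.full_like(scores, -np.inf, dtype=float)
    np.put_along_axis(masked, idx, np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # numerically stable softmax over the surviving entries
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because low-scoring positions are dropped before normalization, attention mass concentrates on the most relevant regions, which is the efficiency/long-range trade-off the abstract refers to.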
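The Focaler-IoU loss mentioned above remaps IoU through a piecewise-linear function before computing the loss, so that samples inside a chosen IoU interval dominate the gradient. A minimal sketch follows, assuming the standard form `1 - IoU_focaler` and hypothetical interval bounds `d` and `u`; the paper's exact loss composition may differ.

```python
def focaler_iou(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    """Piecewise-linear remapping of IoU onto [0, 1]:
    0 below d, 1 above u, linear in between."""
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)

def focaler_iou_loss(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    # IoU-style regression loss with the Focaler remapping substituted in
    return 1.0 - focaler_iou(iou, d, u)
```

Shrinking the interval `[d, u]` focuses training on a narrower band of sample difficulty, which is how the loss "dynamically adjusts the contribution of samples based on their IoU."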




Data availability
No datasets were generated or analysed during the current study.
References
M. Milford, S. Anthony, W. Scheirer, Self-driving vehicles: Key technical challenges and progress off the road, IEEE Potentials, 39 (2019) 37-45. https://doi.org/10.1109/MPOT.2019.2939376
B. Mahaur, K. Mishra, A. Kumar, An improved lightweight small object detection framework applied to real-time autonomous driving, Expert Systems with Applications, 2023. https://doi.org/10.1016/j.eswa.2023.121036
Z. Song, L. Liu, F. Jia, Y. Luo, C. Jia, G. Zhang, L. Yang, L. Wang, Robustness-aware 3d object detection in autonomous driving: A review and outlook, IEEE Transactions on Intelligent Transportation Systems, 2024. https://doi.org/10.48550/arXiv.2401.06542
E. Alpaydin, C. Kaynak, Cascading classifiers, Kybernetika, 1998.
R. Girshick, Fast R-CNN, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440-1448. https://doi.org/10.1109/ICCV.2015.169
K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 2961-2969. https://doi.org/10.1109/ICCV.2017.322
S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. https://doi.org/10.1109/TPAMI.2016.2577031
J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788. https://doi.org/10.1109/CVPR.2016.91
J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, arXiv preprint arXiv:1804.02767, 2018. https://doi.org/10.48550/arXiv.1804.02767
C.-Y. Wang, A. Bochkovskiy, H.-Y.M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7464-7475. https://doi.org/10.1109/CVPR52729.2023.00721
A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, YOLOv10: Real-time end-to-end object detection, Advances in Neural Information Processing Systems, 2024. https://doi.org/10.48550/arXiv.2405.14458
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, Ssd: Single shot multibox detector, in: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, Springer, 2016, pp. 21-37. https://doi.org/10.48550/arXiv.1512.02325
L. Xu, W. Yan, J. Ji, The research of a novel WOG-YOLO algorithm for autonomous driving object detection, Scientific Reports, 2023. https://doi.org/10.1038/s41598-023-30409-1
M. Li, X. Liu, S. Chen, L. Yang, Q. Du, Z. Han, J. Wang, MST-YOLO: Small Object Detection Model for Autonomous Driving, Sensors, 2024. https://doi.org/10.3390/s24227347
R. Chaudhry, SD-YOLO-AWDNet: A hybrid approach for smart object detection in challenging weather for self-driving cars, Expert Systems with Applications, 2024. https://doi.org/10.1016/j.eswa.2024.124942
L. Wang, S. Hua, C. Zhang, G. Yang, J. Ren, J. Li, YOLOdrive: A Lightweight Autonomous Driving Single-Stage Target Detection Approach, IEEE Internet of Things Journal, 2024. https://doi.org/10.1109/JIOT.2024.3439863
B. Yu, Z. Li, Y. Cao, C. Wu, J. Qi, L. Wu, YOLO-MPAM: Efficient real-time neural networks based on multi-channel feature fusion, Expert Systems with Applications, 2024. https://doi.org/10.1016/j.eswa.2024.124282
J. Ren, J. Yang, W. Zhang, K. Cai, RBS-YOLO: A vehicle detection algorithm based on multi-scale feature extraction, Signal, Image and Video Processing, 2024. https://doi.org/10.1007/s11760-024-03007-5
T. Xue, Z. Liu, S. Lan, Q. Zhang, A. Yang, J. Li, YOLO-FSE: An Improved Target Detection Algorithm for Vehicles in Autonomous Driving, IEEE Internet of Things Journal, 2025. https://doi.org/10.1109/JIOT.2025.3526224
Q. Fan, Y. Li, M. Deveci, K. Zhong, S. Kadry, LUD-YOLO: A novel lightweight object detection network for unmanned aerial vehicle, Information Sciences, 2025. https://doi.org/10.1016/j.ins.2024.121366
H. Wang, J. Liu, J. Zhao, J. Zhang, D. Zhao, Precision and speed: LSOD-YOLO for lightweight small object detection, Expert Systems with Applications, 2025. https://doi.org/10.1016/j.eswa.2025.126440
R. Khanam, M. Hussain, YOLOv11: An overview of the key architectural enhancements, arXiv preprint arXiv:2410.17725, 2024. https://doi.org/10.48550/arXiv.2410.17725
X. Zhang, C. Liu, D. Yang, T. Song, Y. Ye, K. Li, Y. Song, RFAConv: Innovating spatial attention and standard convolutional operation, arXiv preprint arXiv:2304.03198, 2023. https://doi.org/10.48550/arXiv.2304.03198
X. Chen, H. Li, M. Li, J. Pan, Learning a sparse transformer network for effective image deraining, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 5896-5905. https://doi.org/10.1109/CVPR52729.2023.00571
H. Zhang, S. Zhang, Focaler-IoU: More focused intersection over union loss, arXiv preprint arXiv:2401.10525, 2024. https://doi.org/10.48550/arXiv.2401.10525
A. Geiger, P. Lenz, C. Stiller, R. Urtasun, Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research, 2013. http://dx.doi.org/10.1177/0278364913491297
F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, T. Darrell, Bdd100k: A diverse driving dataset for heterogeneous multitask learning, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2636-2645. https://doi.org/10.1109/CVPR42600.2020.00271
C. Wang, W. He, Y. Nie, J. Guo, C. Liu, Y. Wang, K. Han, Gold-YOLO: Efficient object detector via gather-and-distribute mechanism, Advances in Neural Information Processing Systems, 2023. https://doi.org/10.48550/arXiv.2309.11331
Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, J. Chen, DETRs beat YOLOs on real-time object detection, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 16965-16974. https://doi.org/10.1109/CVPR52733.2024.01605
Author information
Authors and Affiliations
Contributions
Author 1 was responsible for the project design, manuscript review, and editing; Author 2 contributed key conceptual ideas, methodology, analysis of experimental data, and manuscript writing; Author 3 handled data acquisition and the review of relevant literature; Author 4 was in charge of software environment configuration and the review and validation of experimental results; Author 5 organized the experimental data and created Tables 1, 2, 3, 4, 5 and 6; Author 6 organized the experimental data and prepared Figs. 1, 2, 3 and 4. All authors reviewed and approved the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Y., Zhen, M., Ma, Y. et al. RTF-YOLO: an efficient object detection framework for autonomous vehicles. SIViP 19, 993 (2025). https://doi.org/10.1007/s11760-025-04585-8
Received:
Revised:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1007/s11760-025-04585-8

