{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T16:03:34Z","timestamp":1776528214688,"version":"3.51.2"},"reference-count":20,"publisher":"MDPI AG","issue":"18","license":[{"start":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T00:00:00Z","timestamp":1631491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Department of Sciences and Technology of the Xinjiang Production and Construction Corps, China","award":["2017DB005"],"award-info":[{"award-number":["2017DB005"]}]},{"name":"Key Technologies Research and Development Program of China","award":["2016YFD0300601"],"award-info":[{"award-number":["2016YFD0300601"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection plays an important role in autonomous driving, disaster rescue, robot navigation, intelligent video surveillance, and many other fields. Nonetheless, visible images are poor under weak illumination conditions, and thermal infrared images are noisy and have low resolution. Consequently, neither of these two data sources yields satisfactory results when used alone. While some scholars have combined visible and thermal images for object detection, most did not consider the illumination conditions and the different contributions of diverse data sources to the results. In addition, few studies have made use of the temperature characteristics of thermal images. Therefore, in the present study, visible and thermal images are utilized as the dataset, and RetinaNet is used as the baseline to fuse features from different data sources for object detection. 
Moreover, a dynamic weight fusion method, which is based on channel attention according to different illumination conditions, is used in the fusion component, and the channel attention and a priori temperature mask (CAPTM) module is proposed; the CAPTM can be applied to a deep learning network as a priori knowledge and maximizes the advantage of temperature information from thermal images. The main innovations of the present research include the following: (1) the consideration of different illumination conditions and the use of different fusion parameters for different conditions in the feature fusion of visible and thermal images; (2) the dynamic fusion of different data sources in the feature fusion of visible and thermal images; (3) the use of temperature information as a priori knowledge (CAPTM) in feature extraction. To a certain extent, the proposed methods improve the accuracy of object detection at night or under other weak illumination conditions and with a single data source. Compared with the state-of-the-art (SOTA) method, the proposed method is found to achieve superior detection accuracy with an overall mean average precision (mAP) improvement of 0.69%, including an AP improvement of 2.55% for the detection of the Person category. 
The results demonstrate the effectiveness of the research methods for object detection, especially temperature information-rich object detection.<\/jats:p>","DOI":"10.3390\/rs13183656","type":"journal-article","created":{"date-parts":[[2021,9,13]],"date-time":"2021-09-13T23:32:23Z","timestamp":1631575943000},"page":"3656","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":49,"title":["Visible-Thermal Image Object Detection via the Combination of Illumination Conditions and Temperature Information"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9177-3452","authenticated-orcid":false,"given":"Hang","family":"Zhou","sequence":"first","affiliation":[{"name":"Institute of Remote Sensing and GIS, Peking University, Beijing 100871, China"}]},{"given":"Min","family":"Sun","sequence":"additional","affiliation":[{"name":"Institute of Remote Sensing and GIS, Peking University, Beijing 100871, China"}]},{"given":"Xiang","family":"Ren","sequence":"additional","affiliation":[{"name":"Institute of Remote Sensing and GIS, Peking University, Beijing 100871, China"}]},{"given":"Xiuyuan","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Remote Sensing and GIS, Peking University, Beijing 100871, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,9,13]]},"reference":[{"key":"ref_1","first-page":"311","article-title":"The Status and Development Trend of Infrared Image Processing Technology","volume":"35","author":"Qian","year":"2013","journal-title":"Infrared Technol."},{"key":"ref_2","unstructured":"Choi, E.J., and Park, D.J. (December, January 30). Human detection using image fusion of thermal and visible image with new joint bilateral filter. 
Proceedings of the 5th International Conference on Computer Sciences and Convergence Information Technology, Seoul, Korea."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"354","DOI":"10.3724\/SP.J.1010.2011.00354","article-title":"Advances and perspective on motion detection fusion in visual and thermal framework","volume":"30","author":"Zhang","year":"2011","journal-title":"J. Infrared Millim. Waves"},{"key":"ref_4","unstructured":"Wagner, J., Fischer, V., Herman, M., and Behnke, S. (2016, January 27\u201329). Multispectral Pedestrian Detection using Deep Fusion Convolutional Neural Networks. Proceedings of the ESANN, Bruges, Belgium."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1179","DOI":"10.1049\/iet-cvi.2018.5315","article-title":"Multi-layer fusion techniques using a CNN for multispectral pedestrian detection","volume":"12","author":"Chen","year":"2018","journal-title":"IET Comput. Vis."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Liu, J., Zhang, S., Wang, S., and Metaxas, D.N. (2016). Multispectral deep neural networks for pedestrian detection. arXiv.","DOI":"10.5244\/C.30.73"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.patcog.2018.08.005","article-title":"Illumination-aware faster R-CNN for robust multispectral pedestrian detection","volume":"85","author":"Li","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_8","first-page":"91","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"28","author":"Ren","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1016\/j.inffus.2018.11.017","article-title":"Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection","volume":"50","author":"Guan","year":"2019","journal-title":"Inf. 
Fusion"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhao, Y., Cheng, J., Zhou, W., Zhang, C., and Pan, X. (2019, January 18\u201321). Infrared pedestrian detection with converted temperature map. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China.","DOI":"10.1109\/APSIPAASC47483.2019.9023228"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Goyal, P., Girshick, R., He, K., and Doll\u00e1r, P. (2017, January 22\u201329). Focal loss for dense object detection. Proceedings of the IEEE international conference on computer vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.324"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lin, T.-Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_14","unstructured":"(2021, March 30). Free Flir Thermal Dataset for Algorithm Training. Available online: https:\/\/www.flir.com\/oem\/adas\/adas-dataset-form\/."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"103178","DOI":"10.1016\/j.infrared.2019.103178","article-title":"A fast RetinaNet fusion framework for multi-spectral pedestrian detection","volume":"105","author":"Pei","year":"2020","journal-title":"Infrared Phys. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Konig, D., Adam, M., Jarvers, C., Layher, G., Neumann, H., and Teutsch, M. (2017, January 21\u201326). 
Fully convolutional region proposal networks for multispectral person detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.36"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Zhang, H., Fromont, E., Lefevre, S., and Avignon, B. (2020, January 25\u201328). Multispectral Fusion for Object Detection with Cyclic Fuse-and-Refine Blocks. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP40778.2020.9191080"},{"key":"ref_20","unstructured":"Glorot, X., and Bengio, Y. (2010, January 13\u201315). Understanding the difficulty of training deep feedforward neural networks. 
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Chia Laguna Resort, Sardinia, Italy."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/18\/3656\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:01:56Z","timestamp":1760166116000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/13\/18\/3656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,13]]},"references-count":20,"journal-issue":{"issue":"18","published-online":{"date-parts":[[2021,9]]}},"alternative-id":["rs13183656"],"URL":"https:\/\/doi.org\/10.3390\/rs13183656","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,13]]}}}