{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,30]],"date-time":"2025-12-30T17:55:13Z","timestamp":1767117313287,"version":"build-2065373602"},"reference-count":44,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T00:00:00Z","timestamp":1621209600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Crowd counting is a challenging task due to large perspective, density, and scale variations. CNN-based crowd counting techniques have achieved significant performance in sparse to dense environments. However, crowd counting in high perspective-varying scenes (images) is getting harder due to different density levels occupied by the same number of pixels. In this way large variations for objects in the same spatial area make it difficult to count accurately. Further, existing CNN-based crowd counting methods are used to extract rich deep features; however, these features are used locally and disseminated while propagating through intermediate layers. This results in high counting errors, especially in dense and high perspective-variation scenes. Further, class-specific responses along channel dimensions are underestimated. To address these above mentioned issues, we therefore propose a CNN-based dense feature extraction network for accurate crowd counting. Our proposed model comprises three main modules: (1) backbone network, (2) dense feature extraction modules (DFEMs), and (3) channel attention module (CAM). The backbone network is used to obtain general features with strong transfer learning ability. The DFEM is composed of multiple sub-modules called dense stacked convolution modules (DSCMs), densely connected with each other. 
In this way, features extracted from lower and middle layers are propagated to higher layers through dense connections. In addition, task-independent general features obtained by the earlier modules are combined with task-specific features obtained by the later ones to achieve high counting accuracy in scenes with large perspective variation. Furthermore, to exploit the class-specific response between background and foreground, the CAM is incorporated at the end to obtain high-level features along the channel dimension for better counting accuracy. We evaluated the proposed method on three well-known datasets: ShanghaiTech (Part-A), ShanghaiTech (Part-B), and Venice. The performance of the proposed technique demonstrates its effectiveness relative to state-of-the-art methods.<\/jats:p>","DOI":"10.3390\/s21103483","type":"journal-article","created":{"date-parts":[[2021,5,17]],"date-time":"2021-05-17T12:19:57Z","timestamp":1621253997000},"page":"3483","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["HADF-Crowd: A Hierarchical Attention-Based Dense Feature Extraction Network for Single-Image Crowd Counting"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8220-4474","authenticated-orcid":false,"given":"Naveed","family":"Ilyas","sequence":"first","affiliation":[{"name":"Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7233-5833","authenticated-orcid":false,"given":"Boreom","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Biomedical Science and Engineering, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea"}]},{"given":"Kiseon","family":"Kim","sequence":"additional","affiliation":[{"name":"School of Electrical Engineering and Computer 
Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea"}]}],"member":"1968","published-online":{"date-parts":[[2021,5,17]]},"reference":[{"key":"ref_1","first-page":"83","article-title":"World population in 2050: Assessing the projections","volume":"Volume 46","author":"Cohen","year":"1998","journal-title":"Conference Series-Federal Reserve Bank of Boston"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"182050","DOI":"10.1109\/ACCESS.2019.2960292","article-title":"CASA-Crowd: A Context-Aware Scale Aggregation CNN-Based Crowd Counting Technique","volume":"7","author":"Ilyas","year":"2019","journal-title":"IEEE Access"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ilyas, N., Shahzad, A., and Kim, K. (2020). Convolutional-Neural Network-Based Image Crowd Counting: Review, Categorization, Analysis, and Performance Evaluation. Sensors, 20.","DOI":"10.3390\/s20010043"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, D., Chen, S., Gao, S., and Ma, Y. (2016, January 27\u201330). Single-image crowd counting via multi-column convolutional neural network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.70"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Sam, D.B., Surya, S., and Babu, R.V. (2017, January 21\u201326). Switching convolutional neural network for crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.429"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, Y., Zhang, X., and Chen, D. (2018, January 18\u201323). Csrnet: Dilated convolutional neural networks for understanding the highly congested scenes. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00120"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Liu, W., Lis, K., Salzmann, M., and Fua, P. (2018). Geometric and Physical Constraints for Head Plane Crowd Density Estimation in Videos. arXiv.","DOI":"10.1109\/IROS40897.2019.8967852"},{"key":"ref_8","unstructured":"Huang, S., Li, X., Cheng, Z.Q., Zhang, Z., and Hauptmann, A. (2018). Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"4953","DOI":"10.1109\/TII.2018.2852481","article-title":"Multiscale Multitask Deep NetVLAD for Crowd Counting","volume":"14","author":"Shi","year":"2018","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_10","unstructured":"Wang, L., Shao, W., Lu, Y., Ye, H., Pu, J., and Zheng, Y. (2018). Crowd Counting with Density Adaption Networks. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Zhou, C., Chang, F., and Kot, A.C. (2018). Attention to Head Locations for Crowd Counting. arXiv.","DOI":"10.1007\/978-3-030-34110-7_61"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Liu, L., Wang, H., Li, G., Ouyang, W., and Lin, L. (2018). Crowd Counting using Deep Recurrent Spatial-Aware Network. arXiv.","DOI":"10.24963\/ijcai.2018\/118"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1109\/TIP.2017.2740160","article-title":"Body structure aware deep crowd counting","volume":"27","author":"Huang","year":"2018","journal-title":"IEEE Trans. Image Process."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1016\/j.image.2018.03.004","article-title":"Counting challenging crowds robustly using a multi-column multi-task convolutional neural network","volume":"64","author":"Yang","year":"2018","journal-title":"Signal Process. 
Image Commun."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Liu, X., van de Weijer, J., and Bagdanov, A.D. (2018, January 18\u201323). Leveraging unlabeled data for crowd counting by learning to rank. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00799"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhu, J., Feng, F., and Shen, B. (2018, January 18\u201320). People counting and pedestrian flow statistics based on convolutional neural network and recurrent neural network. Proceedings of the 2018 33rd Youth Academic Annual Conference of Chinese Association of Automation (YAC), Nanjing, China.","DOI":"10.1109\/YAC.2018.8406516"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wan, J., and Chan, A. (2019, January 27\u201328). Adaptive density map generation for crowd counting. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00122"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Liu, L., Li, G., Wang, Q., Xiao, N., and Lin, L. (2019, January 8\u201312). Crowd counting via multi-view scale aggregation networks. Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China.","DOI":"10.1109\/ICME.2019.00259"},{"key":"ref_19","unstructured":"Zhang, A., Yue, L., Shen, J., Zhu, F., Zhen, X., Cao, X., and Shao, L. (November, January 27). Attentional neural fields for crowd counting. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_20","unstructured":"Tian, Y., Lei, Y., Zhang, J., and Wang, J.Z. (2018). PaDNet: Pan-Density Crowd Counting. arXiv."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., Yasarla, R., and Patel, V.M. (2019, January 27\u201328). Pushing the frontiers of unconstrained crowd counting: New dataset and benchmark method. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00131"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (2019, January 27\u201328). Multi-level bottom-top and top-bottom feature fusion for crowd counting. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00109"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Jiang, X., Xiao, Z., Zhang, B., Zhen, X., Cao, X., Doermann, D., and Shao, L. (2019). Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network. arXiv.","DOI":"10.1109\/CVPR.2019.00629"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"112977","DOI":"10.1016\/j.eswa.2019.112977","article-title":"DSPNet: Deep scale purifier network for dense crowd counting","volume":"141","author":"Zeng","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_25","unstructured":"Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. (2014). How transferable are features in deep neural networks?. arXiv."},{"key":"ref_26","unstructured":"Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Yang, M., Yu, K., Zhang, C., Li, Z., and Yang, K. (2018, January 18\u201323). Denseaspp for semantic segmentation in street scenes. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00388"},{"key":"ref_28","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019). Pytorch: An imperative style, high-performance deep learning library. arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Marsden, M., McGuinness, K., Little, S., and O\u2019Connor, N.E. (2016). 
Fully convolutional crowd counting on highly congested scenes. arXiv.","DOI":"10.5220\/0006097300270033"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (September, January 29). Cnn-based cascaded multi-task learning of high-level prior and density estimation for crowd counting. Proceedings of the 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Italy.","DOI":"10.1109\/AVSS.2017.8078491"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, L., Shi, M., and Chen, Q. (2018, January 12\u201315). Crowd counting via scale-adaptive convolutional neural network. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00127"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"19945","DOI":"10.1007\/s11042-019-7377-y","article-title":"Multi-scale dilated convolution of convolutional neural network for image denoising","volume":"78","author":"Wang","year":"2019","journal-title":"Multimed. Tools Appl."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (2017, January 22\u201329). Generating high-quality crowd density maps using contextual pyramid cnns. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.206"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Shen, Z., Xu, Y., Ni, B., Wang, M., Hu, J., and Yang, X. (2018, January 18\u201323). Crowd counting via adversarial cross-scale consistency pursuit. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00550"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Shi, Z., Zhang, L., Liu, Y., Cao, X., Ye, Y., Cheng, M.M., and Zheng, G. (2018, January 18\u201323). Crowd counting with deep negative correlation learning. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00564"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Babu Sam, D., Sajjan, N.N., Venkatesh Babu, R., and Srinivasan, M. (2018, January 18\u201323). Divide and grow: Capturing huge diversity in crowd images with incrementally growing cnn. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00381"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"566","DOI":"10.1109\/TII.2019.2935244","article-title":"Cross-Level Parallel Network for Crowd Counting","volume":"16","author":"Li","year":"2019","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_38","unstructured":"Wang, Z., Xiao, Z., Xie, K., Qiu, Q., Zhen, X., and Cao, X. (2018). In defense of single-column networks for crowd counting. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Ranjan, V., Le, H., and Hoai, M. (2018, January 8\u201314). Iterative crowd counting. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_17"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, J., Gao, C., Meng, D., and Hauptmann, A.G. (2018, January 18\u201323). Decidenet: Counting varying density crowds through attention guided detection and density estimation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00545"},{"key":"ref_41","unstructured":"Ilyas, N., Najarro, A.C., and Kim, K. (2020). DFE-Crowd: Dense Feature Extraction for Single Image Crowd Counting, Korean Communication Society."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Sindagi, V.A., and Patel, V.M. (2019, January 18\u201321). Inverse attention guided deep crowd counting network. 
Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan.","DOI":"10.1109\/AVSS.2019.8909889"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, A., Shen, J., Xiao, Z., Zhu, F., Zhen, X., Cao, X., and Shao, L. (2019, January 27\u201328). Relational attention network for crowd counting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00689"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Liu, W., Salzmann, M., and Fua, P. (2019, January 15\u201320). Context-aware crowd counting. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00524"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/10\/3483\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:02:38Z","timestamp":1760162558000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/10\/3483"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,17]]},"references-count":44,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2021,5]]}},"alternative-id":["s21103483"],"URL":"https:\/\/doi.org\/10.3390\/s21103483","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,5,17]]}}}